COMPUTER PROGRAMS


A maximum likelihood classification capability has now been developed
and put into operation. The flow scheme for this program was developed by
Fernando Esparza, in consultation with G. L. Thomas, and the programming was
done by James J. Millard.

Three programs have been written. 

(1) Classification Program. 

The principal feature of this program is the maximum likelihood
classification procedure, which is based on the mathematical method outlined
by Swain.[1] Since this method is rather costly in terms of computer time,
two alternatives were added that provide perhaps less exact results at
reduced computer time. They are: (a) classification according to the
class whose centroid is the least Euclidean distance from the point
being classified, and (b) choosing the three nearest classes by the above
distance measurement and then using the maximum likelihood method to make
the classification from among those three classes. For purposes of
discussion, we shall refer to these three options as MAXLIK, MINDIST, and
MAX/MIN, respectively.
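
The three options can be illustrated with a minimal sketch. It assumes
Gaussian class statistics with equal prior probabilities, which the report
does not state; the function names and the numerical values are hypothetical
and are not taken from the original program.

```python
import numpy as np

def log_likelihood(x, mean, cov):
    """Gaussian log-likelihood of x for one class, dropping the constant term."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + diff @ np.linalg.inv(cov) @ diff)

def mindist(x, means):
    """MINDIST: index of the class whose centroid is nearest to x (Euclidean)."""
    return int(np.argmin([np.linalg.norm(x - m) for m in means]))

def maxlik(x, means, covs, candidates=None):
    """MAXLIK: most likely class, optionally restricted to candidate indices."""
    if candidates is None:
        candidates = range(len(means))
    candidates = list(candidates)
    scores = [log_likelihood(x, means[k], covs[k]) for k in candidates]
    return int(candidates[int(np.argmax(scores))])

def max_min(x, means, covs, n_nearest=3):
    """MAX/MIN: maximum likelihood among the n_nearest classes by distance."""
    dists = [np.linalg.norm(x - m) for m in means]
    nearest = np.argsort(dists)[:n_nearest]
    return maxlik(x, means, covs, nearest)

# Hypothetical 4-band statistics: class 0 is compact, class 1 is spread out
# along band 1, and class 2 lies far from the test point.
means = [np.zeros(4), np.array([4.0, 0.0, 0.0, 0.0]), np.full(4, 20.0)]
covs = [np.eye(4), np.diag([25.0, 1.0, 1.0, 1.0]), np.eye(4)]
x = np.array([-3.0, 0.0, 0.0, 0.0])

print(mindist(x, means), maxlik(x, means, covs), max_min(x, means, covs))
```

In this made-up case the point is nearest to the centroid of the compact
class but is more likely under the widely spread class, so the distance rule
and the likelihood rule disagree.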

After any one of the above analysis methods is concluded for each data 
sample, the appropriate class character is assigned for mapping purposes. 
Concurrently, a character count for each class determined is maintained for 
tabulation and use during analysis of the requested rectangular area under 
investigation. Up to eight classes may be requested to describe any area.


[1] Philip N. Swain. Pattern Recognition: A Basis for Remote Sensing Data
Analysis. LARS Information Note 111572 (9/10/73).
(2) Matrix Tape File Generation Program

This program computes the parameters needed for the maximum likelihood
technique from training data samples of various land-use classes as observed
in the four MSS bands. The parameters include the covariance matrix for each
class, the determinant of that matrix, and the centroid value in each band
for each class established. The parameters are then recorded on a magnetic
tape for use in the other programs described. Different files have been
created for the various areas under investigation, as well as for each ERTS
data set used, because radiance values differ with atmospheric conditions
and sun angles.
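
As a rough illustration of these parameters, the sketch below computes the
centroid, covariance matrix, and determinant for one class from a small set
of 4-band training samples. The sample values are invented for the example,
and the use of the unbiased (n - 1) covariance estimate is an assumption; the
report does not say which estimate the original program used.

```python
import numpy as np

def class_parameters(samples):
    """Return (centroid, covariance matrix, determinant) for one class.

    samples: array of shape (n_samples, n_bands), one row per training pixel.
    """
    samples = np.asarray(samples, dtype=float)
    centroid = samples.mean(axis=0)             # mean value in each band
    covariance = np.cov(samples, rowvar=False)  # bands are columns; n - 1 divisor
    determinant = np.linalg.det(covariance)
    return centroid, covariance, determinant

# Hypothetical training pixels for one class (four MSS bands per pixel).
training = [
    [10, 12, 30, 40],
    [12, 14, 28, 44],
    [11, 10, 33, 39],
    [13, 15, 29, 41],
    [9, 11, 31, 42],
    [14, 13, 27, 43],
]

centroid, covariance, determinant = class_parameters(training)
print(centroid)
print(covariance.shape, determinant)
```

A class needs more training pixels than bands for the covariance matrix to be
invertible, which the maximum likelihood computation requires.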

(3) Character Counting Program 

This program yields the character count and fraction of the total for
each class, with the added capability to examine any polygonal area, which
is not feasible in the Maximum Likelihood program. The logic of the
Maximum Likelihood program applies, except that the mapping output is
omitted.
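
Counting characters inside a polygon can be sketched as follows. This is an
illustrative version only: it assumes a ray-casting test applied to pixel
centers, which may differ from the logic of the original program, and the
small map and polygon are made up.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_characters(class_map, polygon):
    """Count class characters whose pixel centers fall inside the polygon."""
    counts = Counter()
    for line, row in enumerate(class_map):
        for sample, character in enumerate(row):
            if point_in_polygon(sample + 0.5, line + 0.5, polygon):
                counts[character] += 1
    total = sum(counts.values())
    fractions = {c: n / total for c, n in counts.items()} if total else {}
    return counts, fractions

# Hypothetical 4 x 4 classified map, one character per pixel.
class_map = [
    " //S",
    "  /B",
    "XX/B",
    "SSSS",
]
# Polygon vertices as (sample, line) pairs.
polygon = [(0, 0), (4, 0), (4, 3)]

counts, fractions = count_characters(class_map, polygon)
print(dict(counts), fractions)
```

Pixels whose centers fall exactly on a polygon edge are not handled specially
here; a production version would need a rule for them.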

It is appropriate at this point to acknowledge the contributions of Jay 
Millard. In addition to doing the bulk of the computer programming, as indicated 
by the title page, he also has handled the input parameters for the various 
computer runs as requested by the co-investigator, and has made numerous 
innovations to the programs and participated in the interpretation of results. 

COMPUTER TIME REQUIREMENTS 

The time-saving value of the two alternate methods is indicated by the
following figures on central processor time on the G. E. 635 computer for
six-class maps of the Titusville area (72,240 pixels, 129 sq. mi.):

METHOD         TIME (min.)

MAXLIK            16.9
MAX/MIN           12.6
MINDIST            5.8

ACCURACY OF RESULTS 

As indicated above, the program computes the fraction of the total area
represented by each class being considered. When these results are compared
for the three alternatives, some differences are found, as indicated by the
figures of Table 1. Those figures apply to the total region shown in the MAXLIK
map of Figure 1. Study of Table 1 shows that the minimum distance method,
compared to the maximum likelihood method, shows significantly reduced
residential area and slightly increased water, marsh, and undeveloped areas.
In an attempt to understand the differences, plots have been made of the
histograms and centroid locations for the training samples of the six classes,
and are given in Figures 2 to 6. Some basis for the higher number of
residential choices made by the maximum likelihood method can be seen in the
wider spread of the residential histograms as compared to the histograms for
the water, marsh, and undeveloped classes. Of course, it needs to be kept in
mind that the training samples were carefully chosen and that a test point
does not necessarily fit neatly into a classification. An example of this is
a sector which was classified as undeveloped by the minimum distance method
and as residential by the maximum likelihood method. The sector actually is
undeveloped, with relatively sparse vegetation so that some sand shows through.

In an attempt to evaluate the accuracy of the three classification 
methods, a check was made of individual characters (pixels) by simple random 
sampling using a random number table to choose the line and sample number for 
samples within the solid lines of Figure 1. Aerial photography (color infrared 

TABLE 1

                          CHARACTER              FRACTION OF TOTAL
CLASS                     REPRESENTATION    MAXLIK    MAX/MIN   MINDIST

water                     blank               34%       34%       37%
undeveloped               »                   26%       25%       28%
marsh                     S                   20%       22%       22%
residential               /               [illegible]   17%       10%
commercial                X                    1%        1%        2%
industrial,               B                    2%        1%
new construction,
bare sand

and panchromatic) was the primary source of ground truth information, reinforced
by local knowledge and the Titusville City Planner's land use map. One percent 
of the total number of points was checked. The results are given in Table 2. 
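
The sampling step can be sketched as follows, with a pseudo-random generator
standing in for the random number table. The rectangular bounds, the seed,
and the function name are assumptions made for illustration; the area
actually sampled was the one enclosed by the solid lines of Figure 1.

```python
import random

def sample_pixels(first_line, last_line, first_sample, last_sample,
                  fraction, seed=None):
    """Draw a simple random sample of (line, sample) positions without
    replacement from a rectangular block of pixels."""
    rng = random.Random(seed)
    n_lines = last_line - first_line + 1
    n_samples = last_sample - first_sample + 1
    total = n_lines * n_samples
    k = max(1, round(total * fraction))
    picks = rng.sample(range(total), k)  # distinct pixel indices
    return sorted(
        (first_line + p // n_samples, first_sample + p % n_samples)
        for p in picks
    )

# Hypothetical block: lines 1-100, samples 1-200, one percent checked.
points = sample_pixels(1, 100, 1, 200, fraction=0.01, seed=1973)
print(len(points), points[:3])
```

Each selected position would then be compared against the aerial photography
and the land use map to decide whether the assigned class is correct.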

The overall accuracy figures can be changed appreciably by the choice of 
boundaries - in this case by the amount of marsh and open water included within 
the boundaries - but the relative accuracies of the three methods are not 
changed. If the "safe" classes of open water and marsh are excluded, the 
accuracy figures for the three methods become 90, 87, and 89 percent, respectively. 
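
The effect of excluding the "safe" classes can be checked from the counts of
correct and incorrect points in Table 2. The short sketch below reproduces
the arithmetic; the dictionary layout is simply a convenient way of holding
the table's overall, water, and marsh rows.

```python
# (number correct, number incorrect) from Table 2
table2 = {
    "MAXLIK":  {"overall": (189, 13), "water": (40, 0), "marsh": (31, 0)},
    "MAX/MIN": {"overall": (185, 17), "water": (40, 0), "marsh": (29, 0)},
    "MINDIST": {"overall": (187, 14), "water": (40, 0), "marsh": (29, 0)},
}

def percent_correct(correct, incorrect):
    return 100.0 * correct / (correct + incorrect)

overall = {}
without_safe = {}
for method, rows in table2.items():
    correct, incorrect = rows["overall"]
    overall[method] = round(percent_correct(correct, incorrect))
    # Remove the open water and marsh points from both tallies.
    for safe in ("water", "marsh"):
        correct -= rows[safe][0]
        incorrect -= rows[safe][1]
    without_safe[method] = round(percent_correct(correct, incorrect))

print(overall)
print(without_safe)
```

The rounded results agree with the overall figures of Table 2 (94, 92, 93)
and with the 90, 87, and 89 percent quoted for the reduced set of classes.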

In the sampling results, 56 points were not counted as either correct 
or incorrect because of the difficulty of making a clear-cut decision. Most 
of these uncertainties were due to geometric uncertainties at class boundaries; 
others applied to locations which were mixtures of two classes. Points in this
uncounted group included: 

Golf courses usually were classified as residential. 

One high density residential (apartment) point was 
classified as commercial. 

Major highways were usually classed as residential. 

Five points were identified as marsh by MINDIST and
as undeveloped by MAXLIK and MAX/MIN. The state
of those points at the time of the ERTS pass is not
known.

Schools and school yards were usually classed as 
residential.

How such uncertain points are counted and how much "safe" area is
included in the area under consideration have important effects on accuracy
figures. For these reasons, and since the results quoted here apply to only one
region and the number of training samples is small (18 to 42 per class), the
accuracy figures should be regarded only as rough indicators.

TABLE 2

                              MAXLIK                    MAX/MIN                    MINDIST
                      Number    Number       %    Number    Number       %    Number    Number       %
CLASS                Correct Incorrect Correct   Correct Incorrect Correct   Correct Incorrect Correct

Water                     40         0     100        40         0     100        40         0     100
Marsh                     31         0     100        29         0     100        29         0     100
Undeveloped               81        11      88        83        11      88        90         4      96
Residential               29         1      97        30         0     100        25         4      86
Commercial                 1         1      50         1         1      50         2         0     100
Industrial, new            7         0     100         2         5      29         1         4      20
construction, bare
sand

OVERALL                  189        13      94       185        17      92       187        14      93

A few other points are worth noting:

Urban classes can be expected to be less homogeneous than, say, agricul- 
tural classes and, therefore, be more difficult to classify accurately. 

Citrus varies widely in spectral pattern due, at least in part, to
variations in the amount of sand seen between the trees. In these maps, citrus
was sometimes classed as residential.

Separation of commercial and industrial is, in this case, not reliable.
Reliable results are obtained if they are combined into a single class.

Some of the B’s shown on Figure 1 are bare sand in fairly new residential 
areas with sparse vegetation. Some others represent areas denuded of vegetation 
by motorbikes. 

At this time we are not able to separate industrial, new construction,
and bare sand. The spread of the histogram for this single class is large.
Perhaps additional training data will help.

TITUSVILLE 

The city limits of Titusville are shown as the dashed line on Figure 1.
A character count for the City of Titusville, uncorrected for bad scan lines,
gives the results shown in Table 3, along with data prepared by conventional
methods by the Titusville Planning Department.

The total areas for the City are in good agreement, indicating that 
character-counting is a suitable method of determining areas. 

The ERTS map, since it is based on physical characteristics, provides
some information not normally available on planners' conventional land-use
maps - water and marsh areas, in this case. Table 2 indicates that the figures
for marsh area are quite reliable. The figures for water normally are quite
reliable, but in this case there is some uncertainty due to the uncertainty in
the fit of the city boundary to the river shore line.
On the other hand, the conventional land-use map, based on economic,
political and social factors, provides some classifications which an ERTS map
cannot provide, e.g., utilities, public, and institutional uses. Thus, a
complete correspondence between the two sets of figures is not possible. Some
classes can be grouped for comparison, as shown in Table 3.

The residential area on the ERTS map includes essentially everything in
the residential area: residences, streets, vacant lots, golf courses, schools,
and other features with similar spectral characteristics and, therefore, must
be compared with item 7 from the planners' figures. The planners' residential
figure applies only to lots containing residences.

As pointed out above, the ERTS map does not give reliable separation
between commercial and industrial classes; hence, for comparison purposes, those
two classes are combined into a single class. This figure is approximately
twice the planners' figure (891 against 397 acres, Table 3). The difference can
be explained by the recording of regions which are not commercial or industrial
but nevertheless have high light reflectance, e.g., apartment houses, apartment
house parking lots, and regions with barren vegetation. B's on the computer map
which were believed to be due to bare sand were subtracted from the character
count to give the industrial figures quoted. This says that Titusville has
234 acres of bare sand, of which 86 acres are denuded of vegetation by
motorbike usage.

For the same reasons that the ERTS residential figure is higher than the
corresponding planners' figure, the ERTS area for water, marsh, and
undeveloped combined is lower.

A 1973 conventional land-use map of Titusville is included as Figure 7,
for comparison with Figure 1.

TABLE 3

                                        TITUSVILLE
                                    PLANNING DEPARTMENT            ERTS MAP             PER-CENT
                                        AREA          % OF       AREA           % OF  DIFFERENCE
     CLASS                          ACRES  HECTARES  TOTAL    ACRES  HECTARES  TOTAL

(1)  Water                                                       34        14      0
(2)  Marsh                                                      300       120      3
(3)  Undeveloped                                               3573      1429     33
(4)  Residential                     2222       888     21     5744      2298     53
(5)  Commercial                       329       131      3      424       170      4         +29
(6)  Industrial                        68        27      1      467       187      4        +587
(7)  Residential, Streets,           4611      1844     43   (5744)      2298                +25
     Rights-of-Way, Utilities,
     Transportation, Public,
     Institutional
(8)  Commercial and Industrial        397       159      4      891       356      8        +124
(9)  Water, Marsh and Undeveloped    5730      2292     53     3907      1563     36         -32
(10) Residential and Highway                                   5809      2324     54
(11) Bare Sand                                                  234        94      2

     Total                         10,738      4295          10,841      4336                 +1
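
The per-cent difference column of Table 3 can be reproduced from the acreage
columns. The sketch below assumes the difference is taken relative to the
Planning Department figure, which is consistent with every value in the
table; the row labels are abbreviated for the example.

```python
# (Planning Department acres, ERTS acres) for rows reported by both sources
table3 = {
    "Commercial": (329, 424),
    "Industrial": (68, 467),
    "Residential, streets, etc. (item 7)": (4611, 5744),
    "Commercial and industrial": (397, 891),
    "Water, marsh and undeveloped": (5730, 3907),
    "Total": (10738, 10841),
}

def percent_difference(planning, erts):
    """Difference of the ERTS figure from the planners' figure, in percent."""
    return 100.0 * (erts - planning) / planning

differences = {
    name: round(percent_difference(planning, erts))
    for name, (planning, erts) in table3.items()
}

for name, diff in differences.items():
    print(f"{name:38s}{diff:+5d}")
```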

[Figure 7. 1973 conventional land-use map of Titusville.]